Using Mixed Precision in Low‐Synchronization Reorthogonalized Block Classical Gram‐Schmidt

Authors

Abstract

Mixed precision hardware has recently become commercially available, and more than 25% of the supercomputers in the TOP500 list now have mixed precision capabilities. Using lower precision in algorithms can be beneficial in terms of reducing both computation and communication costs. There are many current efforts towards developing mixed precision numerical linear algebra algorithms, which can lead to speedups in real applications. Motivated by this, we aim to further the state of the art by analyzing mixed precision variants of iterative methods. In methods based on Krylov subspaces, an orthogonal basis is generated by the Arnoldi or Lanczos processes or their variants. For long-recurrence methods such as GMRES, one needs to use an explicit orthogonalization scheme such as Gram-Schmidt to orthonormalize the generated vectors. Krylov subspace methods are typically communication-bound on modern machines; their runtime is usually dominated by the cost of the global synchronizations required for the necessary orthogonalization. This has motivated the development of various algorithmic variants that attempt to reduce communication while maintaining a stable algorithm. Recent work has focused on low-synchronization block orthogonalization schemes which, when used within communication-avoiding (s-step) Krylov subspace methods, reduce the number of synchronizations to one per block. In this work, we focus on a variant of block classical Gram-Schmidt with reorthogonalization, which we call BCGSI+LS. We demonstrate that the loss of orthogonality produced … exceed O(u)κ(…
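To make the setting concrete, here is a minimal NumPy sketch of a plain block classical Gram-Schmidt factorization with one full reorthogonalization pass, together with the loss-of-orthogonality measure ‖I − QᵀQ‖ referred to in the abstract. It is an illustration only, not the low-synchronization BCGSI+LS variant analyzed in the paper, and the function name, block size, and test matrix are assumptions made for the example.

```python
import numpy as np

def bcgs_with_reorth(X, block_size):
    """Block classical Gram-Schmidt with one full reorthogonalization pass.

    Illustrative sketch only: a textbook two-pass block CGS, not the
    low-synchronization BCGSI+LS variant from the paper. Assumes X is
    m x n with n divisible by block_size.
    """
    m, n = X.shape
    Q = np.zeros((m, n), dtype=X.dtype)
    R = np.zeros((n, n), dtype=X.dtype)
    for k in range(0, n, block_size):
        cols = slice(k, k + block_size)
        W = X[:, cols].copy()
        if k > 0:
            prev = slice(0, k)
            # First projection against the already-orthogonalized blocks.
            S1 = Q[:, prev].T @ W
            W -= Q[:, prev] @ S1
            # Second (reorthogonalization) pass to repair orthogonality
            # lost to rounding in the first projection.
            S2 = Q[:, prev].T @ W
            W -= Q[:, prev] @ S2
            R[prev, cols] = S1 + S2
        # Intra-block orthogonalization of the current block.
        Q[:, cols], R[cols, cols] = np.linalg.qr(W)
    return Q, R

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 40))
Q, R = bcgs_with_reorth(A, block_size=4)
# Loss of orthogonality ||I - Q^T Q|| and relative factorization residual.
print(np.linalg.norm(np.eye(Q.shape[1]) - Q.T @ Q, 2))
print(np.linalg.norm(A - Q @ R) / np.linalg.norm(A))
```

For a well-conditioned test matrix the printed loss of orthogonality is near machine precision; the question studied in the paper is how such bounds depend on the condition number κ when parts of the orthogonalization are carried out in lower precision.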


Similar Articles

Mixed-Precision Memcomputing

As the CMOS scaling laws break down because of technological limits, a radical departure from the processor-memory dichotomy is needed to circumvent the limitations of today’s computers. In-memory computing is a promising concept in which the physical attributes and state dynamics of nanoscale resistive memory devices organized in a computational memory unit are exploited to perform computation...


Mixed-Precision Vector Processors


Mixed Precision Training

Increasing the size of a neural network typically improves accuracy but also increases the memory and compute requirements for training the model. We introduce methodology for training deep neural networks using half-precision floating point numbers, without losing model accuracy or having to modify hyperparameters. This nearly halves memory requirements and, on recent GPUs, speeds up arithmetic...
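As a hedged aside, the recipe summarized above is commonly implemented with half-precision arithmetic for the forward and backward passes, a single-precision master copy of the weights, and loss scaling; the toy linear least-squares model and every parameter value in the sketch below are illustrative assumptions, not details taken from that paper.

```python
import numpy as np

# Minimal sketch of mixed precision training on a toy linear least-squares
# model: float16 arithmetic for the forward/backward passes, a float32
# "master" copy of the weights, and a fixed loss scale so small gradients
# stay representable in float16. All values here are illustrative choices.
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 8)).astype(np.float32)
y = (X @ rng.standard_normal(8)).astype(np.float32)
X16, y16 = X.astype(np.float16), y.astype(np.float16)

w_master = np.zeros(8, dtype=np.float32)   # master weights kept in float32
lr, loss_scale = 0.5, 1024.0

for step in range(100):
    w16 = w_master.astype(np.float16)               # low-precision working copy
    err = X16 @ w16 - y16                           # forward pass in float16
    g16 = X16.T @ (err * (loss_scale / len(y)))     # scaled backward pass in float16
    grad = g16.astype(np.float32) / loss_scale      # unscale in float32
    w_master -= lr * grad                           # update the float32 master weights

print("relative residual:", np.linalg.norm(X @ w_master - y) / np.linalg.norm(y))
```

The float32 master copy prevents small updates from being rounded away, and the loss scale keeps small gradients representable in float16; both effects are undone (the gradient is cast back and unscaled) before the weight update.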


Mixed quantum–classical dynamics

We present a unified derivation of the mean-field (Ehrenfest) and surface-hopping approaches to mixed quantum–classical dynamics that elucidates the underlying approximations of the methods and their strengths and weaknesses. We then report a quantitative test of alternative mixed quantum–classical methods against accurate quantum mechanical calculations for a simple one-dimensional curve-crossing...


Classical mixed partitions

Through a method given in [3], a mixed partition of PG(2n−1, q²) can be used to construct a (2n−1)-spread of PG(4n−1, q) and, hence, a translation plane of order q²ⁿ. A mixed partition in this case is a partition of the points of PG(2n−1, q²) into PG(n−1, q²)’s and PG(2n−1, q)’s which we call Baer subspaces. In this paper, we completely classify the mixed partitions which generate regular...



Journal

Journal title: Proceedings in Applied Mathematics & Mechanics

Year: 2023

ISSN: 1617-7061

DOI: https://doi.org/10.1002/pamm.202200060